Comparing top-k XML lists

Authors

Ramakrishna Varadarajan

Fernando Farfán

Vagelis Hristidis

Abstract

Systems that produce ranked lists of results are abundant. For instance, Web search engines return ranked lists of Web pages. There has been work on distance measures for list permutations, such as Kendall tau and Spearman's Footrule, as well as extensions that handle top-k lists, which are more common in practice. In addition to ranking whole objects (e.g., Web pages), an increasing number of systems provide keyword search on XML or other semistructured data and produce ranked lists of XML sub-trees. Unfortunately, previous distance measures are not suitable for ranked lists of sub-trees, since they do not account for possible overlap between the returned sub-trees: two sub-trees differing by a single node would be considered separate objects. In this paper, we present the first distance measures for ranked lists of sub-trees and show under what conditions these measures are metrics. Furthermore, we present algorithms to compute these distance measures efficiently. Finally, we evaluate and compare the proposed measures on real data using three popular XML keyword proximity search systems.
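For readers unfamiliar with the two classical measures named in the abstract, the following is a minimal Python sketch of Kendall tau and Spearman's Footrule in their textbook form, for two full permutations of the same items. It is not the sub-tree measures proposed in the paper, and it does not cover the top-k extensions; the function names and the sample lists are illustrative assumptions.

```python
from itertools import combinations


def spearman_footrule(a, b):
    """Sum of absolute rank displacements between two permutations.

    Assumes a and b contain exactly the same items, each once.
    """
    pos_b = {item: rank for rank, item in enumerate(b)}
    return sum(abs(rank - pos_b[item]) for rank, item in enumerate(a))


def kendall_tau(a, b):
    """Number of item pairs that the two permutations order differently.

    Assumes a and b contain exactly the same items, each once.
    """
    pos_b = {item: rank for rank, item in enumerate(b)}
    # combinations(a, 2) yields pairs (x, y) with x ranked before y in a;
    # the pair is discordant when b ranks y before x.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


# Hypothetical ranked lists of whole objects (e.g., Web pages).
list_a = ["p1", "p2", "p3", "p4"]
list_b = ["p2", "p1", "p4", "p3"]

print(spearman_footrule(list_a, list_b))  # 4
print(kendall_tau(list_a, list_b))        # 2
```

Both functions compare items by identity only, which is the limitation the abstract points out: if the items were XML sub-trees, two sub-trees differing by a single node would simply count as different objects.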


Similar resources

A Clustered Index Approach to Distributed XPath

Supporting top-k queries over distributed collections of schemaless XML data poses two challenges. While XML supports expressive query languages such as XPath and XQuery, these languages require schema knowledge so as to write an appropriate query which may not be available in distributed systems with autonomous and dynamic sources. Thus, there is a need for approximate query processing. Furthe...


Efficient query processing and index tuning using proximity scores

In the presence of growing data, the need for efficient query processing under result quality and index size control becomes more and more a challenge to search engines. We show how to use proximity scores to make query processing effective and efficient with focus on either of the optimization goals. More precisely, we make the following contributions: • We present a comprehensive comparative ...


An Efficient and Versatile Query Engine for TopX Search

This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold algorithms for top-k query processing with a focus on inexpensive sequential accesses to index lists and only a few judiciously scheduled random accesses. The difficulties in applying the existing ...


On the Integration of Structure Indexes and Inverted Lists

Several methods have been proposed to evaluate queries over a native XML DBMS, where the queries specify both path and keyword constraints. These broadly consist of graph traversal approaches, optimized with auxiliary structures known as structure indexes; and approaches based on information-retrieval style inverted lists. However, no published literature addresses methods of combining structur...


The RankGroup Join Algorithm: Top-k Query Processing in XML Datasets

This project investigates top-k queries in XML datasets. We propose a syntactical addition to XQuery to accommodate top-k XML queries. We then propose a 3-step process to realize these top-k XML queries using a relational database and a new join operator, RankGroup. Our preliminary implementation shows promise in dramatically reducing the running time and number of tuples accessed during such q...



Journal:

Inf. Syst.

Volume 38, issue

Pages -

Publication date: 2013
